AITopics | different learning rate

different learning rate

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?

Neural Information Processing Systems | Feb-9-2026, 04:53:38 GMT

In recent years, the Lottery Ticket Hypothesis (LTH) [1] has drawn great attention and thorough research efforts.

artificial intelligence, machine learning, ticket, (17 more...)

Genre: Contests & Prizes (0.43)

Industry: Leisure & Entertainment (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

234b941e88b755b7a72a1c1dd5022f30-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 19:45:08 GMT

That is, we optimize for the α and β hyperparameters while fixing the σ to a negligible amount (σ = 2 32 specifically).

artificial intelligence, machine learning, see figure description above, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.52)

Neuronal Fluctuations: Learning Rates vs Participating Neurons

Pareek, Darsh, Kumar, Umesh, Rao, Ruthu, Janjam, Ravi

arXiv.org Artificial Intelligence | Nov-14-2025

Deep Neural Networks (DNNs) rely on inherent fluctuations in their internal parameters (weights and biases) to effectively navigate the complex optimization landscape and achieve robust performance. While these fluctuations are recognized as crucial for escaping local minima and improving generalization, their precise relationship with fundamental hyperparameters remains underexplored. A significant knowledge gap exists concerning how the learning rate, a critical parameter governing the training process, directly influences the dynamics of these neural fluctuations. This study systematically investigates the impact of varying learning rates on the magnitude and character of weight and bias fluctuations within a neural network. We trained a model using distinct learning rates and analyzed the corresponding parameter fluctuations in conjunction with the network's final accuracy. Our findings aim to establish a clear link between the learning rate's value, the resulting fluctuation patterns, and overall model performance. By doing so, we provide deeper insights into the optimization process, shedding light on how the learning rate mediates the crucial exploration-exploitation trade-off during training. This work contributes to a more nuanced understanding of hyperparameter tuning and the underlying mechanics of deep learning.
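The link between learning rate and parameter fluctuation can be illustrated on a toy problem. The sketch below is not the authors' experiment: it runs noisy gradient descent on the one-dimensional quadratic f(w) = 0.5 * w**2 and measures how much the parameter fluctuates once training has settled. The names `sgd_fluctuation` and `theoretical_std`, the Gaussian gradient noise, and all constants are assumptions made for this example.

```python
import math
import random
import statistics


def sgd_fluctuation(lr, steps=20000, burn_in=2000, noise_std=1.0, seed=0):
    """Noisy gradient descent on f(w) = 0.5 * w**2.

    Returns the standard deviation of w over the steps after burn-in,
    used here as a simple measure of parameter fluctuation.
    """
    rng = random.Random(seed)
    w = 0.0  # start at the minimum so only noise-driven motion is measured
    trace = []
    for step in range(steps):
        grad = w + rng.gauss(0.0, noise_std)  # exact gradient plus noise
        w -= lr * grad
        if step >= burn_in:
            trace.append(w)
    return statistics.pstdev(trace)


def theoretical_std(lr, noise_std=1.0):
    """Stationary std of w for this linear toy model: sqrt(lr * s^2 / (2 - lr))."""
    return math.sqrt(lr * noise_std ** 2 / (2.0 - lr))


# Hypothetical learning rates chosen for the illustration.
fluctuations = {lr: sgd_fluctuation(lr) for lr in (0.01, 0.05, 0.2)}
for lr, measured in fluctuations.items():
    print(f"lr={lr}: measured={measured:.3f}, predicted={theoretical_std(lr):.3f}")
```

In this toy setting the fluctuation grows with the learning rate, roughly as the square root of the learning rate when it is small. Real networks are not quadratic, so this should be read only as intuition for the trade-off the abstract mentions.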

artificial intelligence, fluctuation, machine learning, (18 more...)

arXiv:2511.10435

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supplementary material Contents

Neural Information Processing Systems | Oct-2-2025, 11:27:05 GMT

We calculate the AIC for each of the 16 models fit.

artificial intelligence, learning rate, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

bb1443cc31d7396bf73e7858cea114e1-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-22-2025, 02:13:36 GMT

We will fix these issues in the revision.

baseline, experiment, raa, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)

A Proofs for Section 3

Neural Information Processing Systems | Aug-15-2025, 19:40:39 GMT

The lemma is proven in Section D. First consider an even k. This together with (37) completes the proof of (23). C.1 Proof of Theorem 5 Recall we let a D.1 Proof of Lemma 1 We show the following more general result. The proof is a simple exercise in linear algebra.

batch size, decomposition, different learning rate, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

6a130f1dc6f0c829f874e92e5458dced-Paper.pdf

Neural Information Processing Systems | Aug-14-2025, 23:38:32 GMT

There have been long-standing controversies and inconsistencies over the experimental setup and the criteria for identifying the "winning ticket" in the literature.

learning rate, lottery ticket hypothesis, ticket, (15 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Michigan (0.04)

Genre: Contests & Prizes (0.83)

Industry: Leisure & Entertainment (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Networks (0.93)

Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability

Jiang, Kaiqi, Cohen, Jeremy, Li, Yuanzhi

arXiv.org Artificial Intelligence | Jul-18-2025

The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
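The phrase "inversely proportional to the step size" echoes a classical fact about gradient descent on a quadratic loss: with curvature λ and step size η, each update multiplies the distance to the minimum by |1 - η·λ|, so the iteration contracts only when λ < 2/η. The sketch below illustrates that threshold on a one-dimensional quadratic. It is an assumption-laden toy, not the paper's NTK analysis, and the function names and constants are hypothetical.

```python
def gd_final_distance(curvature, lr, steps=100, w0=1.0):
    """Gradient descent on f(w) = 0.5 * curvature * w**2.

    Returns |w| after `steps` updates, i.e. the distance to the minimum at 0.
    """
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # gradient of f is curvature * w
    return abs(w)


def stability_threshold(lr):
    """Largest curvature for which the update above still contracts: 2 / lr."""
    return 2.0 / lr


lr = 0.1
threshold = stability_threshold(lr)

# Just below the threshold the iterate shrinks; just above it blows up.
below = gd_final_distance(0.9 * threshold, lr)
above = gd_final_distance(1.1 * threshold, lr)
print(f"threshold={threshold}, below={below:.3e}, above={above:.3e}")
```

Halving the step size doubles the threshold, which is the inverse proportionality in its simplest form. In a real network the curvature itself changes during training, which is what makes the Edge of Stability regime non-trivial.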

alignment, artificial intelligence, machine learning, (18 more...)

arXiv:2507.12837

Country:

North America > United States (0.27)
Europe (0.27)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Additive Model Boosting: New Insights and Path(ologie)s

Schulte, Rickmer, Rügamer, David

arXiv.org Machine Learning | Mar-7-2025

Additive models (AMs) have sparked a lot of interest in machine learning recently, allowing the incorporation of interpretable structures into a wide range of model classes. Many commonly used approaches to fit a wide variety of potentially complex additive models build on the idea of boosting additive models. While boosted additive models (BAMs) work well in practice, certain theoretical aspects are still poorly understood, including general convergence behavior and what optimization problem is being solved when accounting for the implicit regularizing nature of boosting. In this work, we study the solution paths of BAMs and establish connections with other approaches for certain classes of problems. Along these lines, we derive novel convergence results for BAMs, which yield crucial insights into the inner workings of the method. While our results generally provide reassuring theoretical evidence for the practical use of BAMs, they also uncover some "pathologies" of boosting for certain additive model classes concerning their convergence behavior that require caution in practice. We empirically validate our theoretical findings through several numerical experiments.
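For readers unfamiliar with boosting additive models, one common textbook variant is componentwise L2 boosting: at each step, every component is fit to the current residuals by least squares, the best-fitting component is selected, and only a fraction of its fit (the learning rate) is added to the model. The sketch below applies this to a linear model with one coefficient per feature. It is a minimal illustration under these assumptions, not necessarily the setting the authors analyze, and the function name and data are hypothetical.

```python
def componentwise_l2_boost(X, y, learning_rate=0.1, steps=500):
    """Componentwise L2 boosting for a linear model without intercept.

    X is a list of rows, y a list of targets. Returns one coefficient
    per column of X.
    """
    n_features = len(X[0])
    coef = [0.0] * n_features
    residual = list(y)
    for _ in range(steps):
        best_j, best_beta, best_sse = None, 0.0, float("inf")
        for j in range(n_features):
            column = [row[j] for row in X]
            denom = sum(v * v for v in column)
            if denom == 0.0:
                continue  # a constant-zero column cannot explain anything
            # Least-squares fit of this single component to the residuals.
            beta = sum(v * r for v, r in zip(column, residual)) / denom
            sse = sum((r - beta * v) ** 2 for v, r in zip(column, residual))
            if sse < best_sse:
                best_j, best_beta, best_sse = j, beta, sse
        if best_j is None:
            break
        # Add only a shrunken version of the best component's fit.
        coef[best_j] += learning_rate * best_beta
        residual = [r - learning_rate * best_beta * row[best_j]
                    for row, r in zip(X, residual)]
    return coef


# Noise-free toy data generated from y = 2 * x0 - 1 * x1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0 * x0 - 1.0 * x1 for x0, x1 in X]
coef = componentwise_l2_boost(X, y)
print(coef)
```

On this noise-free example the coefficients approach the least-squares solution as the number of steps grows. Stopping early leaves them shrunken toward zero, which is the implicit regularization the abstract refers to.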

convergence, linear model, new insight and path, (12 more...)

arXiv:2503.05538

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

A Hessian-informed hyperparameter optimization for differential learning rate

Xu, Shiyun, Bu, Zhiqi, Zhang, Yiliang, Barnett, Ian

arXiv.org Artificial Intelligence | Jan-12-2025

Differential learning rate (DLR), a technique that applies different learning rates to different model parameters, has been widely used in deep learning and has achieved empirical success in its various forms. For example, parameter-efficient fine-tuning (PEFT) applies zero learning rates to most parameters to significantly reduce the computational cost. At its core, DLR leverages the observation that different parameters can have different loss curvature, which is hard to characterize in general. We propose the Hessian-informed differential learning rate (Hi-DLR), an efficient approach that solves the hyperparameter optimization (HPO) of learning rates and adaptively captures the loss curvature for any model and optimizer. Given a proper grouping of parameters, we empirically demonstrate that Hi-DLR can improve convergence by dynamically determining the learning rates during training. Furthermore, we can quantify the influence of different parameters and freeze the less-contributing parameters, which leads to a new PEFT that automatically adapts to various tasks and models. Hi-DLR also exhibits comparable performance on various full model training tasks.
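The basic mechanism of a differential learning rate can be sketched in a few lines: parameters are assigned to groups, and each group is updated with its own step size, with a zero step size freezing the group as in PEFT. The example below shows only this grouping mechanism on a toy quadratic loss. It does not implement Hi-DLR's Hessian-informed selection of the rates, and the parameter names, groups, and rates are hypothetical.

```python
def sgd_step(params, grads, group_lrs, group_of):
    """One gradient step in which each parameter uses its group's learning rate."""
    return {
        name: value - group_lrs[group_of[name]] * grads[name]
        for name, value in params.items()
    }


# Hypothetical two-group model: a frozen backbone and a trainable head.
params = {"backbone.w": 1.0, "head.w": 1.0}
group_of = {"backbone.w": "backbone", "head.w": "head"}
group_lrs = {"backbone": 0.0, "head": 0.5}  # zero rate freezes the backbone

# Toy loss: 0.5 * (backbone.w**2 + head.w**2), so each gradient equals
# the parameter value itself.
for _ in range(10):
    grads = {name: value for name, value in params.items()}
    params = sgd_step(params, grads, group_lrs, group_of)

print(params)
```

After ten steps the head parameter has shrunk by a factor of 0.5 per step while the backbone parameter is unchanged. Deep learning frameworks typically expose the same idea through per-group optimizer settings.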

artificial intelligence, hi-dlr, machine learning, (16 more...)

arXiv:2501.06954

Country:

Europe (0.46)
North America > United States (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
